The diversity index for Chicago

Read in demographic data

Read in only the fields from data/il2020.pl_COMBINED_TRACT.csv that are needed to calculate the diversity index, then use the diversity_index function to compute it as a new field, DI.

The diversity index is the probability that two people picked at random from a population are of different races. It is computed as one minus the probability that the two people are of the same race.

Race here is taken from the variables that break out the population by Hispanic / Non-Hispanic origin, and for simplicity the (relatively small) multiracial population is treated as a single race. This simplifying assumption has little effect on the result and makes the index much easier to compute. Because the index is computed the same way everywhere, it is also reasonable to use for comparing regions.

Relevant fields for Diversity Index:

Other header fields: GEOID, GEOCODE, STATE, COUNTY, SUMLEV, TRACT, BLKGRP, BLOCK, CSA, BASENAME, POP100, HU100, INTPTLAT, INTPTLON

Formula for Diversity Index: 1 - ((H/TOT)^2 + (W/TOT)^2 + (B/TOT)^2 + (AIAN/TOT)^2 + (ASIAN/TOT)^2 + (NHPI/TOT)^2 + (SOR/TOT)^2 + (MULTI/TOT)^2)
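As a rough illustration of the formula, here is a minimal sketch in R. The function name diversity_index_sketch is hypothetical and is not the diversity_index function from R/functions, whose implementation is not shown here. The sketch also assumes that the eight group counts add up to TOT. The counts are those of the first tract in the output further down (GEO_ID 17001000100).

```r
# Hypothetical stand-in for illustration only; not the repository's diversity_index().
diversity_index_sketch <- function(counts) {
  total <- sum(counts)           # TOT, assuming the groups partition the population
  1 - sum((counts / total)^2)    # one minus the chance that both picks share a group
}

# Group counts for tract 17001000100, copied from the first row of the output table
tract <- c(H = 93, W = 4231, B = 117, AIAN = 9, ASIAN = 42,
           NHPI = 7, SOR = 9, MULTI = 136)

di <- diversity_index_sketch(tract)

# Compare against the DI value printed for that row (0.16797006)
print(isTRUE(all.equal(di, 0.16797006, tolerance = 1e-7)))
```

Two boundary cases help when reading the values: a population made up of a single group scores 0, and eight equally sized groups score 1 - 1/8 = 0.875, the maximum possible with eight categories.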

The logic for these computations is encapsulated in the R functions in R/functions.

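# Assumes %>% (e.g. from magrittr) and the helpers in R/functions are already loaded.
# Read the tract-level file and keep only the fields needed for the index.
pl_2020_tract <- "data/il2020.pl_COMBINED_TRACT.csv" %>%
    fread_census %>%
    di_race_var_table
# Compute the diversity index and store it as a new column, DI.
pl_2020_tract$DI <- diversity_index(pl_2020_tract)
pl_2020_tract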
##       STATE COUNTY  TRACT      GEO_ID LSAD_NAME  TOT   H    W   B AIAN ASIAN
##    1:    17    001 000100 17001000100        NA 4644  93 4231 117    9    42
##    2:    17    001 000201 17001000201        NA 2067  38 1827  79    4    18
##    3:    17    001 000202 17001000202        NA 2870  98 2393 183    8    23
##    4:    17    001 000400 17001000400        NA 3793  67 2953 528   11     3
##    5:    17    001 000500 17001000500        NA 1719  31 1366 198    3     4
##   ---                                                                       
## 3261:    17    203 030501 17203030501        NA 7842 200 7194  55    3    94
## 3262:    17    203 030502 17203030502        NA 2387  57 2176   9    5    12
## 3263:    17    203 030601 17203030601        NA 6324 132 5835  93    2    33
## 3264:    17    203 030602 17203030602        NA 3597  45 3426   9    8    22
## 3265:    17    203 030700 17203030700        NA 4532  87 4229  54    2    12
##       NHPI SOR MULTI         DI
##    1:    7   9   136 0.16797006
##    2:    0  13    88 0.21500863
##    3:    0   4   161 0.29632847
##    4:    3  23   205 0.37121916
##    5:    2  14   101 0.35141378
##   ---                          
## 3261:    2  19   275 0.15635679
## 3262:    2   0   126 0.16557604
## 3263:    1  15   213 0.14685054
## 3264:    0  14    73 0.09218707
## 3265:    0   3   145 0.12770402

A histogram of the index: